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SUMMARY 


The'^three grid point resequencing algorithms^^^most often run by NASTRAN 
users are compared for their ability to 'ytiduce mattrix root-mean-square (rms) 
wavefront, which is the moat critical parameter in determining matrix decompo- 
sition time in NASTRAN. The three algorithms are Cuthill-McKee (CM) , Gibbs- 
Poole-Stockmeyer (GPS), and Levy. The first two (CM and GPS) are in the BANDIT 
program, ana the Levy algorithm is in WAVEFRONT. Results are presented for a 
diversified collection of 30 test pt.:'jlems ranging in size from 59 to 2680 
nodes. It is concluded that GPS is exceptionally fast and, for the conditions 
under which the test was made, the algorithm best able to reduce rms wavefront 
consistently well. 


INTAODLCTION 


A central feature of structural analysis with NASTRAN is the factoring (or 
decomposition) of a matrix into upper and lower triangular forms. NASTRAN 's 
current triangular decomposition algorithm is an active column routine similar 
to a variable band or wavefront approach. As such, the computer time required 
to perform a matrix decomposition depends strongly on the sequence assigned to 
the grid point labels. 


Vor real, symmetric decomposition, for example, the time T required can be 
estimated from the relation (ref. 1) 
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where 


N = matrix order, 

c^ = number of active columns in matrix row i, and 

T = time for multlply-add operation (an experimentally determined 
machine time constant) . 


The time T is sequence-dependent since the cj^'s are sequence-dependent. 


Since Cj|^ is sometimes referred to as the row wavefront for row i, equation 
(1) can alternatively be written in terras of the root-mean-square (rms) wave- 
front, 


rms 
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( 2 ) 
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^ T N W ^ 
2 m rms 
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Because of the large size of in equation (1) for many problems, this 

latter form of the timing equation is often the more convenient one to use in 
practice. For reference purposes, typical values for the machine constant Tjjj 
are listed in Table 1 for several computers. 


Core storage requirements for matrix decomposition are also dependent on 
the nodal sequence, the most critical parameter being the maximum matrix wave- 
front W^iaxj which is the maximum value of any c^, 

Th^, for most efficient matrix decomposition, the user would like to 
assign ^id point labels so as to minimize both and with the former 

probably the more important. Unfortunately, it is often difficult for users 
to know how to sequence the nodes to effect a good numbering, particularly for 
large complicated meshes or those generated automatically on a computer , As a 
result, many users turn to NASTRAN preprocessors which automate the labeling 
process. The two most often run by NASTRAN users are BANDIT (refs. 2-4), which 
contains the Cuthill-McKee (CM) (ref, 5) and Gibos-Poole-Stockmeyer (GPS) 

(refs. 6-7) schemes, and WAVEFRONT (ref. B) , which contains the Levy 
resequencing algorithm (refs. 9-10). Both preprocessors read NASTRAN data 
decks as input, resequence the nodes, and generate JS]^’fiRAN SEQGP bulk data 
cards (which tell NASTRAN what the new internal seqi,i^.?e should be) . 

The questions then naturally arise are: How do these three 

resequencing algor itfhq^( CM, GPS, and Levy) compare for their ability to reduce 
rms wavefront? What are^fche time and core requirements of the three 
algorithms? | J 

These questions were addressed recently in another paper (ref. 11), in 
which the algorithms were also compared for matrix profile reduction. Complete 
descriptions of the test problems used for the comparison were presented. The 
purpose of this paper, which is adapted from reference 11, is to sunmiarize for 
the NASTRAN user community the rms wavefront results obtained. 


Subsequent sections of this paper present precise definitions of the rele- 
vant terms, a brief description of the three algorithms to be uested, the 
ground rules of the test, and the test results. 


DEFINITIONS 


Although the definitions given here are reasonably standard (at least in 
finite element circles), uniformity of definitions and notation among the 
various workers in the field does not yet exist. 

Given a symmetric square matrix A of order N, we define a "row bandwidth" 
b^ for row i as the number of columns from the first nonzero in the row to the 
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diagonal, inclusive. Numerically, exceeds by unity the difference between i 
and the column index of the first nonzero entry of row i of A. Then the matrix 
bandwidth B and profile P are defined as 


B = 
F = 


max , 
i^N “l 

(3) 

N 


E b 
i=l ^ 

(4) 


Let Cj;^ denote the number of active columns in row i. By definition, a 
column j is active in row i if j > i and there is a nonzero entry in that 
column in any row with index k 5 i. The matrix wavefront W is then defined as 


- max 
i<N 


■"i 


(5) 


Sometimes c£ is referred to as the row wavefront for row i. Since the matrix A 
is symmetric, 


N N 

P = S b = E c (6) 

i=l i=l 


The wavefront W is sometimes called the maximum wavefront to distinguish 

it from the average wavefront and root-roean-square wavefront Wj^jjis defined 

as 
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From these definicions, it follows that, for a given matrix, 

w iw :5 b5n 

avg rms max 


(9) 


The first two inequalities would be equalities only for uninteresting special 
cases such as diagonal matrices. 


We define Che degree of node i as the number of other nodes to which it 
is connected; l.e,, more precisely, dj^ is the number of nonzero off-diagonal 
terms in row i of Che matrix A. (This implies, for example, that all nodes in 
the same finite element are "connected" to each other.) Hence, the maximum 
nodal degree M is 


M = 


max 

i<N 


d. 

1 


( 10 ) 
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The number ol: ui^lciue eclRea E is defined as the number of nonzero off-diagonal 
terms above the diagonal. Hence, for a symmetric matrix, 

1 N 

E = E d, (11) 

2 i.i ^ 

Thus the total number of nonzeros in A is 2E+N, and the density p of the matrix 
A is 


p = (2 E+N)/n2 (12) 

Note that, in these definitions, the diagonal entries of the matrix A are 
included in b^ and c£ (and hence in B, P, ''^avg> ^'^rms) • These 

definitions make it easy to convert the various parameters from one convention 
(including the diagonal) to the other (not Including the diagonal) . 

Also note that, in this context, the order N of the matrix A is sometimes 
taken to be the same as the number of nodes. In general finite element usage, 
however, each node (grid point) has several degrees of freedom (DOF) , not just 
one. For structures having, say, six DOF per node, the actual DOF values of B, 
^'^max> '^avg» ^'^rms be (in the absence of constraints) six times their 

corresponding grid point values. 


Example 


Definitions (3)- (12) can be illustrated by the following simple example. 
Consider the matrix shown below, in which nonzeros are indicated by X's. 
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E=18 


L 


E=18 E=64 E=12 


In each row and column a line is dravim from the first nonzero to the diagonal. 
Thus b.j_ is the number of columns traversed by the solid line in row i. 
Similarly, the number of active columns c^ in row i is the number of vertical 
lines in row i to the right of and including the diagonal. Thus, from the 
definitions, B=6, P=?=18, Wg^^g=3.0, Wj.p,g=3.3, M=3, E=6, and p= 50.0%. 
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THE RESEQUENCING ALGORITHMS TESTED 


Tlie three algorithms tested are Cuthill-McKee (CM) (ref, 5), Gibbs-Poole- 
Stockmeyer (GPS) (refs. 6-7), and Levy (refs. 9-10). In this section each 
algorithm is described briefly, with details concerning the specific implemen- 
tation used. It is recognized that one cannot really evaluate algorithms per 
se , but only specific implementations of algorithms. 

Cuthill-MclCee (CM) (ref. 5) 

The original version of CM operated generally according to the following 
procedure: Among the nodes of low degree, select as potential starting nodes 
those which can root a graph of minimal width. (The term "starting node" 
refers to a node which is assigned the label 1 in the new sequence.) For each 
potential starting node, assign the labels 2 tljrough N by numbering those 
adjacent to new label I (and unnumbered) in order of increasing degree, 
starting with 1=1 and continuing with Increasing I until all nodes are 
sequenced. Of the sequences attempted, select the one having t'*e smallest band- 
wid th. 

The implementation of CM used in these tests is that appearing in the 
BANDIT computer program, version 8 (refs. 2-4), which contains a version of CM 
differing from the original algorithm in two ways. Firut, the new sequence 
obtained is reversed (by setting I to N+l-I for each I) , since it was observed 
by George (ref. 12) and proved by Liu and Sherman (ref. 13) that such a 
reversal (which preserves matrix bandwidth) will often reduce the matrix 
profile and never increase it. Second, of all sequences attempted, the one 
with the smallest rms wavefront js the one selected. Except for these two 
changes, the CM computer code is that originally written by Cu thill and McKee. 

The data structure originally used by CM required about (M+8)N words of 
core storage for the problem-dependent arrays, where N is the number of grid 
points and M is the maximum nodal degree. In the BANDIT implementation of CM, 
word packing is used to reduce the storage requirements to (M/L+8)N, where L, 
the packing density, is an integer (between 2 and 6, inclusive) which depends 
on the problem size and the computer being used. On a CDC 6400, for example, 
the CP time penalty for packing is about 80 psec per pack or unpack. 

Gibbs-Poole-Stock tcyer (GPS) (refs. 6-7) 

The GPS algorithm differs from CM primarily in the selection of starting 
nodes. In GPS, only one starting node is selected, and it is an endpoint of a 
pseudo-diameter of the graph associated with the matrix. Thus, the structure 
need be numbered only once, using a procedure which is similar to the CM 
numbering algorithm. 

The storage requirements of GPS are identical to chose of CM, Including 
the use of integer packing in the BANDIT (version 8) implementation, which is 
the form of GPS used for the testing. The original GPS code was written by 
the developers (ref. 7). 


115 



Levy (refs. 9-10) 


Unlike CM and GPS, which were developed to reduce matrix bandwiath and 
profile, the Levy algorithm was designed specifically to reduce the maximum 
matrix wavefront, W,„ax* The algorithm operates generally according to the 
following recursive procedure: Given the first I nodes of a new sequence, the 

node selected as I+l is the one for which the increase in the row wavefront 
between rows 1 and I-H will be minimum. Levy calls this a "minimum growth" 
method , 

This procedure is followed for one or more trial starting nodes, and the 
sequence yielding the smallest wavefront is selected. The first sequence 

attempted uses as the starting node either a user-selected node or a node of 
minimum degree. The latter option was chosen for these testa since it was felt 
that, for a production mode program, the user ought to be relieved of the 
burden of specifying a starting node. The second and succeeding sequences 
attempted by the Levy algorithm select starting nodes randomly. The number of 
new sequences to be attempted must be specified by the user. After some 
preliminary experimentation to estimate the speed of the algorithm, it was 
decided to request ten sequencing attempts for each test problem. Clearly a 
different number would yield different results. 

The Implementation used for the tests was that obtained by the author from 
Levy in 1973, the only change being that the sequence selected as best is the 
one yielding the smallest rms wavefront . Since the Levy algorithm aborts 

any resequencing attempt in progress once it determines that it cannot reduce 
the previous best the sequence finally selected will be the one among 

those carried to completion yielding the smallest 

The Levy data structure requires about 6N+10E words of core storage for 
the problem-dependent arrays, where N is the number of grid points and E is the 
number of unique edges. The code was not rewritten to use word packing for the 
tests. 


TEST RESULTS AND DISCUSSION 


The three grid point resequencing algorithms described in the preceding 
section were tested on a collection of 30 finite element meshes. These 
problems were collected over a period of several years from NASTRAN users 
representing various U.S. Navy, Army, Air Force, and NASA laboratories. Since 
these meshes are described In detail and plotted elsewhere (ref. 11), that 
information need not be repeated here. In general, however, the collection is 
probably large enough and diversified enough to provide a good test of nodal 
resequencing algorithms. 

The nodes for the 30 test problems were resequenced using the three algo- 
rithms, the objective being to reduce rms wavefront. All computer runs were 
made on a CDC 6400 computer under the NOS/BE operating system. The source code 
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was compiled using the FTN compiler, 0PT=1. T?or reference purposes, a GDC 6400 
is about one-third as fast as a GDG 6600. 

The results of the tests appear in Table 2. In addition to the rms wave- 
front obtained by each algorithm, Table 2 also lists, for each algorithm, the 
CP time expended and the storage requirements for the problem-dependent arrays. 
In the case of CM and GPS, which use word packing, the worst-case of half-word 
packing is assumed. The CP times listed do not include basic setup of the 
arrays . 

The first conclusion to be dramr from Table 2 is that, for most problems, 
all three algorithms achieve about the same reduction in rms wavefront. This 
is, perhaps, somewhat unexpected since CM and GPS were designed primarily to 
reduce matrix bandwidth, whereas the Levy scheme was designed to reduce matrix 
wavefront. For the 30 problems. Levy achieved the best reduction in rms 
wavefront 13 times, GPS 11 times, aird CM 5 times. However, on four occasions 
(Ns503, N=607, N=878, and N=918) Levy did significantly worse than the best 
achieved; on three occasions (N=209, N=245, and N=1242) GPS did significantly 
worse; and on two oc.casions (N=245 and N=592) CM did significantly worse. 

The second, and perhaps most striking, conclusion to be drami from Table 2 
is that GPS is exceptionally fast. In all cases, CM is second fastest, the 

Levy algorithm slowest. The user, of course, has some control over the running 

time of the Levy program (but not of CM and GPS) through his specification of 
the number of resequencing attempts. 

The third conclusion to be dra\m from Table 2 is that the Levy algorithm, 
as is, requires considerably more array storage than either CM or GPS, which 
use the same data structure. In fact, for the Levy program, one problem 
(N=2680) was too big for a CDG 6400 and could not be run. Clearly, the prog: , 

could be ren^ritten to use word packing (as CM and GPS do) , but this may be a 

nontrivial task, since the programmer has to decide which arrays to pack to 
yield the best compromise between storage and CP time. (Word packing, of 
course, saves core at the expense of CP time.) 

Table 2 indicates that Levy's wavefront reduction performance was 
generally best for the smaller problems and GPS's was generally best for the 
larger problems. This is probably due to the author's choice of ten sequencing 
attempts for the Levy algorithm. As the problems get larger, the probability 
of Levy's selecting a good starting node at random goes do^ra. One can infer 
that the algorithm's performance would improve if the program were allowed to 
run longer. Hov^ever, whether the expenditure of more computer time is justi- 
fied would be a matter for each user to decide. One issue that enters into 
such a decision is the number of times a given matrix problem is to be solved. 
If a given problem is to be solved many times (as, for example, in nonlinear 
analysis), or if many right-hand sides are involved (as, for example, in time- 
dependent problems), the time spent in sequencing becomes less important. 

One might also infer that the performance of the LeAry algorithm would 
improve if trial starting nodes were selected using a strategy such as that 
used in CM or GPS, rather than at random, ^-fl-iile this may be true sometimes, it 
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was not true for the test problem on which Levy performed the worst (N=918) , 
because for this problem the first trial starting node selected by Levy (which 
uses a node of minimum degree for the first attempt) was the same starting node 
selected by GPS. This same problem (N=918) was also run by Gibbs with his 
profile algorithm (ref. 14) (which is a hybrid of GPS and King (ref. 15), the 
latter being similar to Levy) with good results. This would indicate that 
Gibbs' modification to the King numbering approach (given a starting node) has 
a significant effect for some problems. 

Overall, GPS's combination of speed and consistency probably rate it the 
best algorithm of the three for rms wavefront reduction. Previous testing 
(ref. 3) has already shown it to be an excellent algorithm for matrix bandwidth 
reduction, for which it was designed. 

Finally, the three algorithms tested were selected because of their heavy 
use by NASTRAN users. However, it would be interesting to see how other 
strategies, including Gibbs-King (ref. 14) and Snay (ref. 16), would perform 
on the same data. Both give good results for profile reduction and hence would 
probably also do well in rms wavefront reduction. 
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TABLE l.-MULTIPLY-ABD TIME CONSTANTS (T^^) 

(Double precision for IBM and Univac, single precision for CDC) 




Computer 

T (microseconds) 
m 

CDC 6400 

16 

6600 

4.3 

7600 

0.6 

Cyber 173 

8.2 

174 

8.2 

175 

1.1 

176 

0.7 

IBM 360/370 - 50 

100 

65 

20 

75 

12 

85 

2 

91,95 

1.7 

155 

25 

165 

2 

195 

0.5 

Univac 1108 

14 

1110 

4 


Source; NASTRAN level 17 block data deck NTMXBD 








TABLE 2 


Ri'iS 



WAVEFRONT TEST RESULTS 

CP TIME ISEC.) STORAGE (HOROSI 

I CH£Gf>S ! LEVY 



GPS 

LEVY 

(H/2+3) N 

6N+10E 

0, 5 

B.2 

2.7 

620 

1394 

0.6 

0.2 

1. 5 

693 

1566 

0.3 

.0.2 

1. 2 

720 

1182 

1. 5 

0.4 

6.1 

1218 

2792 

2. 8 

0.8 

13.4 

1944 

6072 

11.9 

6.6 

36. 2 

4343 

17553 

2.7 

1*6 

23.1 

2673 

7158 

6. 0 

1.3 

37. S 

3344 

892^ 

5.7 

1.5 

3 8. D 

2984 

8366 

1. 5 

0.9 

14.9 

2925 

44 0 4 

4.5 

1 . 4 : 

26.4 

343 0 

7550 

10. 7 

1.9 

73.7 

3684 

12922 

16.2 

2,2 

32.0 

4030 

12550 

18. 0 

2.7 

61.5 

5882 

16476 

11. 3 

1.8 

38.7 

4332 

15126 

19.5 

2.5 

155.1 

5866 

16234 

13.3 

2.9 

145.7 

5396 

16272 

43.3 

4.2 

294.3 

10060 

30538 

10.1 

4.5 

161. 0 

7680 

18Q22 

56. 3 

5.2 

133. 1 

8880 

26112 

37.6 

4.0 

362.5 

8802 

26262 

93. 4 

6.2 

306. 7 

9354 

30728 

132. 2 

10.4 

450.2 

12601 

37294 

45- a 

12.2 

311. 2 

10975 

33118 

95.2 

9.7 

745.7 

12852 

37838 

141.2 

34.3 

801 . a 

16363 

84712 

252- 6 

7.0 

1010.0 

21105 

44110 

42. 6 

14.6 

300.3 

12588 

43882 

124.2 

16.9 

1270.9 

16767 

53372 

342.3 

23,5 

& 

45550 

127310 


EMENT, = NOT RUN 


